A model of manual control during perspective scene viewing is presented, which combines the Crossover
Model with a simplified model of perspective-scene viewing and visual-cue selection. The model is
developed for a particular example task: an idealized constant-altitude task in which the operator
controls longitudinal position in the presence of both longitudinal and pitch disturbances. An
experiment is performed to develop and validate the model. The model corresponds closely with the
experimental measurements, and identified model parameters are highly consistent with the visual cues
available in the perspective scene. The modeling results indicate that operators used one visual cue for
position control, and another visual cue for velocity control (lead generation). Additionally, operators
responded more quickly to rotation (pitch) than translation (longitudinal).


Nomenclature 

$D_x$    longitudinal position of scene feature in world coordinates, eyeheights

$D_y$    lateral position of scene feature in world coordinates, eyeheights

$D_z$    vertical position of scene feature in world coordinates, eyeheights

$\hat{F}_x$    describing function model of operator control output to longitudinal position

$\hat{F}_\theta$    describing function model of operator control output to pitch attitude

$F_x$    describing function measurement of operator control to longitudinal position

$F_\theta$    describing function measurement of operator control to pitch attitude

$H_c$    controlled element dynamics

$H_p$    human operator dynamic element

$H_x$    human operator dynamic element to longitudinal position

$H_\theta$    human operator dynamic element to pitch attitude

$I_h$    horizontal image coordinate

$I_v$    vertical image coordinate

$K_x$    gain parameter

$K_\gamma$    gain parameter for position

$K_\beta$    gain parameter for velocity

$L$    imaging device focal length

$P_{yy}$    power spectral density of $y(t)$

$P_{yz}$    cross power spectral density of $y(t)$ and $z(t)$

$r$    remnant

$s$    Laplace transform variable

SE    standard error

$t$    time, s

$W_{[A,B]}$    sensitivity parameter for visual cue $A$ relative to state variable $B$

$X, x$    nonlinear, linearized operator longitudinal position

$Y, y$    nonlinear, linearized operator lateral position

$Z, z$    nonlinear, linearized operator vertical position

$V$    vertical displacement of a scene feature in the image plane

$\Delta V$    vertical displacement between two scene features in the image plane

$H$    horizontal displacement of a scene feature in the image plane

$\Delta H$    horizontal displacement between two scene features in the image plane

$S$    component of displacement along a line of splay in the image plane

$\alpha$    angle of a line of splay, rad

$\delta$    control input

$\chi^2$    chi-square function

$\Theta, \theta$    nonlinear, linearized pitch attitude, rad

$\Phi, \phi$    nonlinear, linearized roll attitude, rad

$\Psi, \psi$    nonlinear, linearized heading attitude, rad

$\Gamma, \gamma$    nonlinear, linearized visual cue for position

$B, \beta$    nonlinear, linearized visual cue for velocity

$\tau$    time delay, s

$\tau_\theta$    differential time advance for pitch attitude $\theta$, s

$\omega_N$    natural frequency of neuromuscular dynamics, rad/s

$\zeta_N$    damping of neuromuscular dynamics

$\omega_L$    lead equalization break frequency, rad/s

$\omega_c$    crossover frequency, rad/s

$\phi_M$    phase margin, deg


I. Introduction 

This paper describes a model of manual control in which the operator uses a perspective display, that is,
a two-dimensional depiction of a three-dimensional scene. Two of the main models of manual control are
the Crossover Model [1] (CM) and the Optimal Control Model [2] (OCM). Manual control models have been
primarily developed using compensatory and pursuit displays, as opposed to perspective displays.

A small number of researchers have extended the manual control methodologies by combining the OCM
of the human operator with models of perspective-scene viewing. Grunwald and his colleagues have studied
manual control extensively using perspective scenes for a variety of tasks and display types [3-12].
Zacharias developed general models of perspective-scene viewing using the OCM [13,14], which have been
applied to the analysis and design of simulator visual cues [15,16]. Wewerinke applied OCM techniques to
examine the visual cues necessary for glideslope control in the landing task [17,18]. These OCM approaches
have typically represented perspective-scene viewing with a linear combination of the states, where the
combination weights are governed by the perspective-scene features. 'Measurements' obtained from the
perspective scene are used as inputs to a state estimator, which provides a reconstruction of the system
state to be used by the operator to control the vehicle.

The approach described in this paper differs from this previous work in two respects. First, the CM,
rather than the OCM, is used to represent the control actions of the human operator. Second, visual cues
in the perspective scene are used as input to the model, rather than a reconstructed vehicle state. This is
not to suggest that the human operator is unaware of the state of the vehicle; rather, it is proposed that
experienced operators learn to recognize the visual cues in the environment, or sight picture, that
correspond to a particular desired vehicle condition. It is proposed that when such a nominal sight picture
exists, the operator uses the vehicle controls to maintain the nominal sight picture, rather than feeding
back and controlling a full reconstruction of the vehicle state. This approach, and some of the research
that inspired it, will be elaborated upon in the next section.

Relatively little previous work has used the CM to describe tasks in which a perspective display is used.
Johnson and Phatak [19] combined the CM with an analysis of the characteristics of one visual element in a
particular perspective display, and showed that the identified parameters were consistent with the
perspective scene characteristics. Mulder examined the variations in identified CM parameters resulting
from variations in design of a tunnel-in-the-sky display, but did not directly model the perspective scene
characteristics [20]. In this paper, a modeling methodology is described which combines a model of
perspective-scene visual processing with the CM. An experiment which was performed to develop and validate
the model is described. The results of the experiment are used to infer the visual-cue usage strategy of
the operators. The work described in this paper represents a significant expansion of the previous body of
knowledge; the previous work by Johnson and Phatak [19] modeled the use of one visual cue, for one
operator, and one display condition. In this paper, the modeling methodology is developed in a way that
can be generally extended to any task or display. Then, the model is applied to a particular task, and
experimental validation is done with multiple operators and display types.

II. The Visual Cue Control Model 

In this section, the Visual Cue Control Model (VCCM) will be developed, first at a conceptual level, then 
in more detail for a particular task. The conceptual development consists of three parts: 1) discussion of 
other relevant research, 2) description of visual cues in the perspective scene, and 3) incorporation of visual 
cues into the CM. 

A. Previous Research 

In the field of ecological psychophysics, extensive research has been conducted regarding how humans
perceive and use information in a visual scene to accomplish self-motion (Warren and Wertheim edited a
comprehensive compilation of research in this field [21]). The origins of ecological psychophysics date
back to World War II, and to research that James Gibson performed in an attempt to determine
characteristics that could predict which pilot candidates would be successful [22]. After the end of the
war, Gibson wrote about his experiences in a book that laid the foundations for the field of ecological
psychophysics [23]:

Many tests were devised but none of them predicted a prospective flier’s success or failure 
at this task. Many suggestions for training were made but none of them made the performance 
substantially easier. Toward the end of the war it began to be evident to psychologists working on 
problems of aviation that the usual approach to the problem of depth-perception was incorrect. 
Experiments needed to be performed outdoors. The stimuli judged ought to be those of a natural
environment. 

In this spirit, many researchers have studied the visual cues that are useful to a pilot using visual
references. One taxonomy that can be used to categorize visual cues is static versus dynamic. Static cues
are those available in a discrete or momentary time step (Gibson used the term "momentary stimulation").
For example, the position of the horizon relative to the pilot's frame of reference in the aircraft
relates to the attitude of the aircraft. Linear perspective elements can inform the observer of depth and
distance information; textures and spatial gradients of texture, as well as lighting and gradients of
shading, specify distance and shape (and/or orientation) of surfaces in the world. Size of familiar objects
is also a cue useful for determining depth. Discontinuities in both texture and linear elements can inform
the observer of a discontinuity in depth, or an edge.

Dynamic cues can be utilized as well. Optic flow is defined as the pattern of visual movement in visual
stimulation resulting from movement of the observer. Gibson considered the characteristics of optic flow
that resulted from aircraft path and velocity [24,25]. One optic-flow cue commonly used by pilots is the
location of the point of optical expansion, which indicates where the aircraft will reach the ground. In
fact, Langewiesche discussed this cue in his 1944 flight training handbook [26]. A pilot on approach can
ensure that he or she will land at the desired location by adjusting the aircraft controls to keep the
point of optical expansion on the threshold of the runway; specific knowledge of glidepath (in degrees) is
not necessary to perform the task.

This discussion is not intended to provide an exhaustive or complete set of cues useful for
visually-controlled flight; rather, it is intended to give examples of the many types of cues that can
be identified and utilized. A visual cue is, in essence, any definable feature or characteristic of the
visual scene. It can be the position or orientation of an element in the scene, or a function of an area of
the scene (such as a spatial gradient of texture elements). It can be a temporally specified feature such
as the direction and/or magnitude of optic flow, a discontinuity of optic flow, or the rate at which
texture elements cross a boundary (such as the bottom edge of the scene).

Detailed descriptions of particular cues will be given in considering the example task below; at this point, 
the development will focus on an arbitrary visual cue definition, and the methodology used to derive the 
characteristics necessary for incorporation in the CM. 

B. Visual Cue Definition

While the previous section provided qualitative descriptions of visual cues, which are useful for
understanding the concept, a quantitative description of the visual cue is necessary for modeling. This
requires a quantitative description of the perspective scene; specifically, a mathematical description of
the transformations which govern the image formation. Factors that can affect the perspective scene include
1) the locations of scene features, 2) the location and orientation of the imaging device (including
vehicle state), and 3) the imaging device characteristics.

Scene feature descriptions are typically available relative to a fixed, or world, coordinate system.
Another coordinate system fixed to the vehicle being controlled is useful to describe the motion of the
vehicle (in which the imaging device is located) relative to this fixed coordinate system. Eqs. 30 through 32
in Appendix A define the transformation from the location of a feature, imaging characteristics, and vehicle
state, into image coordinates. The visual cue can be defined as a function of the image content. If, for
example, we define a nonlinear visual cue $\Lambda$ to be:

$$\Lambda = G_{image}(I_h, I_v) = G_{world}(L, X, Y, Z, \Theta, \Phi, \Psi, D_x, D_y, D_z) \qquad (1)$$

$G_{image}(\cdot)$ represents an arbitrary function of the image-plane coordinates $(I_h, I_v)$.
$G_{world}(\cdot)$ represents the same function, but expressed as a function of the focal length $L$, the
position and orientation of the imaging device $(X, Y, Z, \Theta, \Phi, \Psi)$, and the position of the
scene feature $D_x$, $D_y$, $D_z$. It is obtained by substituting the expressions for $I_h$ and $I_v$ found
in Eqs. 30 and 31 into the function $G_{image}(I_h, I_v)$. The parameters $D_x$, $D_y$, $D_z$, and $L$ are
fixed for a particular feature and imaging geometry; the remaining variables are the vehicle states. Note
that the arbitrary function $G_{image}(\cdot)$ could be much more complex; it could be, for example, the
summation of optical flow direction within certain boundaries in the image. The previous section provided
many examples of possible visual cues.

The transformation between these vehicle states and the image-plane coordinates is nonlinear. A linear
relationship is desired for incorporation with the quasi-linear CM, since this model describes the linear
input/output relationships of the human operator. If, for example, it is assumed that the operator is trying
to control $X$ (longitudinal position), the linearized visual cue for that state, $\lambda$, can be defined as:

$$\lambda = \left. \frac{d\Lambda}{\partial\Lambda/\partial X} \right|_{X=X_0,\, Y=Y_0,\, Z=Z_0,\, \Theta=\Theta_0,\, \Phi=\Phi_0,\, \Psi=\Psi_0} \qquad (2)$$

where

$$d\Lambda = \frac{\partial\Lambda}{\partial X}dX + \frac{\partial\Lambda}{\partial Y}dY + \frac{\partial\Lambda}{\partial Z}dZ + \frac{\partial\Lambda}{\partial\Theta}d\Theta + \frac{\partial\Lambda}{\partial\Phi}d\Phi + \frac{\partial\Lambda}{\partial\Psi}d\Psi \qquad (3)$$

In this definition of the linearized cue $\lambda$, the differential of the nonlinear cue $\Lambda$ is
normalized by $\partial\Lambda/\partial X$ to create a one-to-one correspondence between the linearized cue
$\lambda$ and the longitudinal position; as a result, $\lambda$ is expressed in units of longitudinal
position. This was done to simplify incorporation into the CM.

By defining some additional terms, these equations can be simplified to provide a visual cue that is a
linear combination of the vehicle states. We define the Sensitivity Parameter, $W_{[\Lambda,Y]}$, to be:

$$W_{[\Lambda,Y]} = \left. \frac{\partial\Lambda/\partial Y}{\partial\Lambda/\partial X} \right|_{X=X_0,\, Y=Y_0,\, Z=Z_0,\, \Theta=\Theta_0,\, \Phi=\Phi_0,\, \Psi=\Psi_0} \qquad (4)$$

Other sensitivity parameters are defined similarly:
$W_{[\Lambda,\Theta]} = (\partial\Lambda/\partial\Theta)/(\partial\Lambda/\partial X)|_{X=X_0,\ldots}$, etc.
We also define the linearized states $x = dX$, $y = dY$, etc. With these definitions, substituting Eq. 3
into Eq. 2 gives:

$$\lambda = x + W_{[\Lambda,Y]}\,y + W_{[\Lambda,Z]}\,z + W_{[\Lambda,\Theta]}\,\theta + W_{[\Lambda,\Phi]}\,\phi + W_{[\Lambda,\Psi]}\,\psi \qquad (5)$$

Given any definition of a nonlinear visual cue $\Lambda$, the expression for a linear visual cue $\lambda$
can be derived that is simply a weighted sum of the vehicle states. The weightings are the sensitivity
parameters as defined in Eq. 4, and are a function of the visual cue definition.
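
As an illustration of Eq. 4, the sketch below estimates a sensitivity parameter numerically by finite
differences. The cue used here is hypothetical: the vertical image coordinate of a ground feature seen
through a simple pinhole projection restricted to the longitudinal-pitch plane. This stand-in projection is
an assumption made for the example and is not the transformation of Eqs. 30 through 32 in Appendix A; the
function names and the values of 3 eyeheights for the feature position and 1 for the focal length are
likewise illustrative.

```python
import math

def vertical_image_coord(X, theta, D_x=3.0, L=1.0):
    """Hypothetical cue: vertical image coordinate of a ground feature.

    Assumes the eye is one eyeheight above a flat ground plane, the feature
    lies D_x eyeheights ahead of the origin, and positive theta (pitch up)
    moves the feature down in the image.
    """
    distance = D_x - X                       # range to the feature, eyeheights
    depression = math.atan2(1.0, distance)   # angle of the feature below the horizon
    return -L * math.tan(depression + theta)

def sensitivity(cue, X0=0.0, theta0=0.0, h=1e-6):
    """Central-difference estimate of (dCue/dTheta) / (dCue/dX), as in Eq. 4."""
    d_dX = (cue(X0 + h, theta0) - cue(X0 - h, theta0)) / (2.0 * h)
    d_dTheta = (cue(X0, theta0 + h) - cue(X0, theta0 - h)) / (2.0 * h)
    return d_dTheta / d_dX

# Sensitivity of the assumed cue to pitch, in eyeheights per radian
W = sensitivity(vertical_image_coord)
```

For this assumed geometry the ratio works out analytically to d^2 + 1, where d is the range to the feature
in eyeheights, so a feature 3 eyeheights ahead gives a sensitivity near 10 eyeheights/rad.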

C. Incorporation of Visual Cues into the Crossover Model

Much of the development of the CM was done using compensatory displays, in which the error between the 
desired state and the actual state is displayed to the operator. The perspective-scene viewing situation is 
fundamentally different from the compensatory-display viewing situation in two ways: 

1) the perspective scene is affected by both controlled and uncontrolled states. 

2) the process by which the vehicle states and feature coordinates are transformed into image coordinates 
is a non-linear transformation. 



Figure 1. Single-loop manual control tasks with Compensatory (a) and Perspective (b) displays.


These two control-display viewing situations are shown conceptually in Fig. 1. Fig. 1a depicts a
single-loop manual control system with a compensatory display [1]. In the compensatory display, only the
error is presented to the operator. Fig. 1b depicts this single-loop control task accounting for
perspective-scene viewing. In this case, the operator is performing regulation (or position-keeping) in
the presence of disturbances, as opposed to tracking a commanded input.

While the systems depicted in Fig. 1 represent simplifications of the physical elements, they do not provide
a good framework for isolating the operator characteristics. To describe the input/output characteristics
of the human operator depicted in Fig. 1a, McRuer and his colleagues developed the CM, a quasi-linear
model of the human operator, consisting of a describing function plus remnant to represent the input-output
characteristics of the operator (Fig. 2a). The describing function consisted of two elements: 1) a
generalized describing function form, and 2) a series of adjustment rules for the describing function
parameters. McRuer and his colleagues found that the adjustment rules were generally functions of the
forcing function, controlled element dynamics, and frequency, and to a lesser extent a function of time and
manipulator characteristics [1].

As previously discussed, the inputs to the model are visual cues. This visual cue feedback, combined with
the mathematical representation for visual cues derived in the previous section, leads to the block diagram
representation of the VCCM shown in Fig. 2b. While this block diagram shows only one controlled ($x$) and
one uncontrolled ($\theta$) state, in general there will be a describing function for each state. The
describing function relative to the controlled state(s) ($H_x$ in Fig. 2b) will be similar to the CM
describing function ($H_p$ in Fig. 2a); this is the outcome of the previously stated assumption. The
describing function relative to the uncontrolled state(s) ($H_\theta$ in Fig. 2b) will be a function of the
visual cue(s) chosen to accomplish the control task.

Next the VCCM will be developed for a particular task that will later be used for experimental validation. 
The example derivation is presented at a level-of-detail that should be sufficient to allow the reader to apply 
the methodology to novel situations. 

Figure 2. Equivalent block diagrams of the human operator in a manual control task, using Compensatory (a)
and Perspective (b) displays.


D. Example 

1. Task 

The task considered here is an idealized hover of a vehicle in the presence of disturbances. The only degrees 
of freedom allowed were longitudinal motion and pitch. The transfer functions representing the vehicle 
dynamics are taken to be: 

$$x(s) = \frac{1}{s(s + 0.2)}\left[\delta(s) + f_x(s)\right] \qquad (6)$$

$$\theta(s) = \frac{1}{s} f_\theta(s) \qquad (7)$$

where $\delta$ is the joystick displacement, $x$ is the longitudinal position in units of eyeheights, and
$\theta$ is the pitch attitude in radians. $f_x$ is a disturbance to the longitudinal acceleration in units
of eyeheights/s$^2$, and $f_\theta$ is a disturbance in pitch rate in units of rad/s. Note that for this
constant-altitude task, all distances are expressed relative to the altitude, or height, of the eyepoint of
the operator. The lack of coupling between longitudinal position and pitch (as would occur in a helicopter)
was a deliberate choice to simplify the modeling task.
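
To make the vehicle dynamics concrete, the following is a minimal time-domain sketch of Eqs. 6 and 7 using
forward-Euler integration. The function name, the step size, and the constant inputs in the usage line are
illustrative assumptions and are not taken from the experiment.

```python
def simulate_vehicle(delta, f_x, f_theta, t_end, dt=0.001):
    """Forward-Euler integration of the example-task vehicle dynamics.

    delta, f_x and f_theta are functions of time t (s).
    Returns the final (x, x_dot, theta).
    """
    x, x_dot, theta = 0.0, 0.0, 0.0
    steps = int(round(t_end / dt))
    for k in range(steps):
        t = k * dt
        # x(s) = [delta(s) + f_x(s)] / [s (s + 0.2)]  <=>  x'' + 0.2 x' = delta + f_x
        x_ddot = -0.2 * x_dot + delta(t) + f_x(t)
        x += dt * x_dot
        x_dot += dt * x_ddot
        # theta(s) = f_theta(s) / s  <=>  theta' = f_theta
        theta += dt * f_theta(t)
    return x, x_dot, theta

# Constant unit joystick input, no longitudinal disturbance, constant pitch-rate disturbance
x, x_dot, theta = simulate_vehicle(lambda t: 1.0, lambda t: 0.0, lambda t: 0.1, t_end=60.0)
```

With a constant unit joystick input the velocity settles toward 1/0.2 = 5 eyeheights/s with a 5 s time
constant, while the pitch attitude simply integrates its disturbance, which reflects the absence of
coupling between the two degrees of freedom.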

2. Visual Cue Definition and Sensitivity Parameter Derivation 

It is assumed that the operator finds visual cues that correlate with the desired state, in this case $x$.
Given a nonlinear visual cue definition $\Lambda$, the linearized cue $\lambda$ can be expressed as (Eq. 5):

$$\lambda = x + W_{[\Lambda,\Theta]}\,\theta \qquad (8)$$

where $W_{[\Lambda,\Theta]} = (\partial\Lambda/\partial\Theta)/(\partial\Lambda/\partial X)|_{X=X_0,\,\Theta=\Theta_0}$.
Note that the visual cue, $\lambda$, is simply a linear combination of $x$ and $\theta$; the weighting
between the states, $W_{[\Lambda,\Theta]}$, specifies the relative contribution of $\theta$ in the cue.

3. Describing Function Model Form 

For the task described, the model form of the describing function will be developed assuming visual cues 
as inputs. However, it is necessary to first review the compensation the operator would adopt with a 
compensatory display, since the VCCM is based upon the assumption that the operator will adopt very 
similar compensation with a perspective display. 

For the controlled-element dynamics in this example task ($H_c = 1/[s(s + 0.2)]$), the CM would predict
(in the region of the crossover frequency $\omega_c$) that the product of the operator compensation
$H_p(s)$ and controlled element dynamics $H_c(s)$ would be:

$$H_p(s)H_c(s) \approx \frac{\omega_c e^{-s\tau}}{s} \qquad (9)$$

Accounting for the fact that the human can probably not generate 5 seconds of lead compensation [27]
(necessary to cancel the pole at $s = -0.2$ in the vehicle dynamics), one would expect the operator dynamics
to take the approximate form:

$$H_p(s) = \frac{\omega_c}{\omega_L} e^{-s\tau}(s + \omega_L) \qquad (10)$$

where $\omega_L$ should occur at a frequency below crossover, and at or above 0.2 rad/s
($0.2 \le \omega_L < \omega_c$).

Figure 3a shows a schematic diagram of this assumed compensation strategy. The transfer function
between the control output and the controlled state would be:

$$\delta(s) = -H_p(s)x(s) + r(s) \qquad (11)$$

The term $r(s)$ is included in this transfer function and in the diagram; this represents remnant
"injected" by the human operator into the control activity. Specifically, it is the control activity that
is not linearly correlated with the input.
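
As a rough numerical companion to the crossover form of Eq. 9, the sketch below computes the phase margin
implied by an ideal open loop consisting of a gain, an integrator, and a pure delay. The function name and
the values of 2.0 rad/s for the crossover frequency and 0.25 s for the delay are illustrative assumptions;
the sketch ignores the lead term and any neuromuscular dynamics.

```python
import math

def crossover_phase_margin_deg(omega_c, tau):
    """Phase margin of the open loop omega_c * exp(-s*tau) / s.

    At s = j*omega_c the magnitude is exactly 1; the phase is -90 deg from
    the integrator minus omega_c*tau radians from the time delay.
    """
    phase_deg = -90.0 - math.degrees(omega_c * tau)
    return 180.0 + phase_deg

pm = crossover_phase_margin_deg(omega_c=2.0, tau=0.25)
```

Under these assumptions the delay alone consumes about 29 degrees of the 90 degrees available from the
integrator, which shows why crossover frequency and time delay trade off against each other in this model
form.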




Figure 3. Block diagrams of human operator performing the example task, using compensatory (a) and
perspective (b) displays.


Next the case of the operator using visual cues from a perspective display will be considered. For the
controlled element dynamics considered in this example, the operator is required to generate a significant
amount of lead compensation; this can be accomplished by feeding back both position and velocity. It is
assumed that the operator can, and possibly will, use different visual cues for position and velocity. The
operator could do this by using central, or foveal, vision to determine the position of a cue, and
parafoveal/peripheral vision to detect visual motion. We define $\gamma$ to be the linearized visual cue
for position, and $\beta$ to be the linearized visual cue for motion, as follows:

$$\gamma(s) = x(s) + W_{[\Gamma,\Theta]}\,\theta(s) \qquad (12)$$

$$\beta(s) = x(s) + W_{[B,\Theta]}\,\theta(s) \qquad (13)$$

Note that as defined, the visual cues $\gamma$ and $\beta$ are expressed in units of eyeheights, and the
parameters defining the contribution of $\theta$ to the visual cue, $W_{[\Gamma,\Theta]}$ and
$W_{[B,\Theta]}$, are expressed in units of eyeheight/rad.

Now the usage of these cues will be examined with consideration of the expected operator compensation
from the CM. Expressed in the time domain, Equations 10 and 11 become:

$$\delta(t) = -\frac{\omega_c}{\omega_L}\left[\dot{x}(t - \tau) + \omega_L\,x(t - \tau)\right] + r(t) \qquad (14)$$

Substituting $\dot{\beta}$ for $\dot{x}$, $\gamma$ for $x$, and $K_x$ for $\omega_c/\omega_L$, we have:

$$\delta(s) = -H_x(s)x(s) - H_\theta(s)\theta(s) + r(s) \qquad (15)$$

where

$$H_x(s) = K_x e^{-s\tau}(s + \omega_L) \qquad (16)$$

$$H_\theta(s) = H_x(s)\,\frac{W_{[B,\Theta]}\,s + W_{[\Gamma,\Theta]}\,\omega_L}{s + \omega_L} \qquad (17)$$

The model resulting from inclusion of visual cues from the perspective display contains two separate
describing functions. The first one, $H_x$, which is the operator's response to the longitudinal position
$x$, is identical in form to the describing function that would be predicted for a compensatory display.
The second describing function, $H_\theta$, is a function of not only $H_x$, but also the additional
parameters $W_{[\Gamma,\Theta]}$ and $W_{[B,\Theta]}$. These parameters are expected to be functions of the
perspective display. Experimentally identified values of the parameters will later be compared with
theoretically expected values for cues available in the perspective scene.
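
The algebra leading from the cue substitution to Eqs. 15 through 17 can be checked numerically. The sketch
below evaluates the control output (without remnant) in two ways at one complex frequency: directly from
the position and velocity cues, and through the two describing functions. All numerical values and
function names are arbitrary assumptions chosen only to exercise the identity.

```python
import cmath

def H_x(s, K_x, tau, omega_L):
    """Describing function to longitudinal position, in the form of Eq. 16."""
    return K_x * cmath.exp(-s * tau) * (s + omega_L)

def H_theta(s, K_x, tau, omega_L, W_gamma, W_beta):
    """Describing function to pitch attitude, in the form of Eq. 17."""
    return H_x(s, K_x, tau, omega_L) * (W_beta * s + W_gamma * omega_L) / (s + omega_L)

# Arbitrary illustrative values (not identified parameters)
s = 1.5j
K_x, tau, omega_L = 2.0, 0.25, 0.5
W_gamma, W_beta = 10.0, 5.0
x, theta = 0.3 + 0.1j, -0.02 + 0.05j

# Linearized cues for position and velocity (Eqs. 12 and 13)
gamma = x + W_gamma * theta
beta = x + W_beta * theta

# Control output from feeding back the velocity cue and the position cue
direct = -K_x * cmath.exp(-s * tau) * (s * beta + omega_L * gamma)

# Same output expressed through the two describing functions (Eq. 15, no remnant)
via_describing_functions = (-H_x(s, K_x, tau, omega_L) * x
                            - H_theta(s, K_x, tau, omega_L, W_gamma, W_beta) * theta)
```

The two expressions agree to rounding error, which confirms that the pitch describing function is the
position describing function scaled by a frequency-dependent blend of the two sensitivity parameters.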

With the model form defined, the relationship between experimental measurements and describing
functions can be derived. This is the subject of the next section.


4. Describing Function Measurement

For experimental validation, it is necessary to relate the models to measurements. Time histories of the
input/output variables $x$, $\theta$, $\delta$, $f_x$, and $f_\theta$ can be used to generate
power-spectral and cross-spectral densities. Equations 6, 7, and 15 can be combined to produce the
following relationships:

$$\frac{P_{\delta f_x}}{P_{x f_x}} = \frac{-H_x H_c P_{f_x f_x} - H_\theta P_{\theta f_x} + P_{r f_x}}{H_c P_{f_x f_x} - H_c H_\theta P_{\theta f_x} + H_c P_{r f_x}} \qquad (18)$$

$$\frac{P_{\delta f_\theta}}{P_{\theta f_\theta}} = \frac{-H_x H_c P_{f_x f_\theta} - H_\theta P_{\theta f_\theta} + P_{r f_\theta}}{(1 + H_x H_c)\,P_{\theta f_\theta}} \qquad (19)$$

The input signals $f_x$ and $f_\theta$ can be made to have zero correlation with each other, and with
ensemble averaging of measurements, the correlation between the remnant and the input signals can be
minimized. With these conditions, the relationships become:

$$\frac{P_{\delta f_x}}{P_{x f_x}} = -H_x \qquad (20)$$

$$\frac{P_{\delta f_\theta}}{P_{\theta f_\theta}} = \frac{-H_\theta}{1 + H_x H_c} \qquad (21)$$

New terms will now be introduced to simplify later comparisons of the models and measurements:

$$F_x = -\frac{P_{\delta f_x}}{P_{x f_x}} \qquad (22)$$

$$F_\theta = -\frac{P_{\delta f_\theta}}{P_{\theta f_\theta}} \qquad (23)$$

$$\hat{F}_x = H_x \qquad (24)$$

$$\hat{F}_\theta = \frac{H_\theta}{1 + H_x H_c} \qquad (25)$$

The terms $F_x$ and $F_\theta$ refer to the experimental measurements, while the terms $\hat{F}_x$ and
$\hat{F}_\theta$ denote the parameterized models which will be developed to fit the measurements.

III. Experimental Validation

A series of experiments was conducted to develop and validate the VCCM. The body of results from
these experiments is too large to contain in this article; instead, a summary of the most significant
results from one experiment is presented. Detailed descriptions of the experiments and results can be found
in references 28 or 29.
( 21 ) 

A. Protocol

A total of eight operators participated in the experiment. The display types used are shown in Fig. 4.
The display types are called Grid (a), Perpendicular (b), Parallel (c), and Line (d). These display types
were chosen to produce differences in the visual cues available to the operator. Each was rendered with
a graphical field-of-view of 60 degrees (vertical) by 75 degrees (horizontal). Operators were told to
control the longitudinal (fore-aft) position of the vehicle; they were also told that the only degrees of
freedom were longitudinal and pitch displacement, and that other states would remain fixed.

The experimental apparatus was a part-task simulation hosted on an SGI Octane computer. A BG Systems JF3
joystick was used for the operator's control inputs. A 19-inch diagonal monitor was used, with a
resolution of 1024 (vertical) by 1280 (horizontal) pixels. The display and joystick information were
updated at a rate of 72 Hz. Each operator received extensive training runs before completing eight
four-minute data runs with each display. Presentation order of the display conditions was counterbalanced
between the operators. The time histories of the states $x$ and $\theta$, the control input $\delta$, and
the disturbances $f_x$ and $f_\theta$ were used to estimate the describing functions $F_x$ and $F_\theta$,
and the measurement standard errors $SE(F_x)$ and $SE(F_\theta)$ [30,31].
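
The estimation of a describing function from time histories can be sketched as a ratio of cross-spectra
evaluated at a disturbance frequency. The example below is a simplified single-run, single-frequency
illustration on synthetic signals; it is not the estimation procedure of the cited references, it omits
ensemble averaging and standard-error estimation, and it assumes the describing function is taken as the
negative of the cross-spectral ratio so that it matches the sign of the operator dynamics. The function
names and all signal parameters are assumptions made for the example.

```python
import cmath
import math

def fourier_coefficient(signal, omega, dt):
    """Single-frequency discrete Fourier sum of a sampled signal at omega (rad/s)."""
    return sum(v * cmath.exp(-1j * omega * k * dt) for k, v in enumerate(signal))

def describing_function(delta, state, disturbance, omega, dt):
    """Estimate -P_delta_f / P_state_f at one disturbance frequency."""
    F = fourier_coefficient(disturbance, omega, dt)
    P_delta_f = F.conjugate() * fourier_coefficient(delta, omega, dt)
    P_state_f = F.conjugate() * fourier_coefficient(state, omega, dt)
    return -P_delta_f / P_state_f

# Synthetic data: 10 s sampled at 100 Hz, with exactly 3 cycles in the record
dt, n = 0.01, 1000
omega = 2.0 * math.pi * 3.0 / (n * dt)
H_true = 2.0 * cmath.exp(0.4j)          # assumed operator response at omega

t = [k * dt for k in range(n)]
disturbance = [math.cos(omega * tk + 0.7) for tk in t]
state = [math.cos(omega * tk) for tk in t]
# Control output delta = -H_true * state (no remnant), written as a real signal
delta = [-abs(H_true) * math.cos(omega * tk + cmath.phase(H_true)) for tk in t]

F_est = describing_function(delta, state, disturbance, omega, dt)
```

Because the record contains a whole number of cycles and no remnant, the estimate recovers the assumed
response essentially exactly; with real data, averaging over runs is what suppresses the remnant
contribution.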



Figure 4. The four display configurations tested. Grid (a), Perpendicular (b), Parallel (c), and Line (d).


B. Results 

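Describing function measurements for $F_x$ and $F_\theta$ were made using the methods previously described.
Parameterized models were fit to the measurements using a maximum-likelihood estimate through minimization
of the chi-square function [32], defined as:

$$\chi^2 = \sum_{i=1}^{12}\left\{ \frac{\mathrm{real}[F_x(\omega_i) - \hat{F}_x(\omega_i)]^2}{SE[\mathrm{real}(F_x(\omega_i))]^2} + \frac{\mathrm{imag}[F_x(\omega_i) - \hat{F}_x(\omega_i)]^2}{SE[\mathrm{imag}(F_x(\omega_i))]^2} \right\} \qquad (26)$$

$$\qquad + \sum_{i=1}^{12}\left\{ \frac{\mathrm{real}[F_\theta(\omega_i) - \hat{F}_\theta(\omega_i)]^2}{SE[\mathrm{real}(F_\theta(\omega_i))]^2} + \frac{\mathrm{imag}[F_\theta(\omega_i) - \hat{F}_\theta(\omega_i)]^2}{SE[\mathrm{imag}(F_\theta(\omega_i))]^2} \right\} \qquad (27)$$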
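
The chi-square cost above can be written compactly in code. The sketch below is a minimal illustration
that assumes the measurements, model values, and standard errors are supplied as equal-length lists, one
entry per measurement frequency; the function name and the toy numbers are hypothetical. The total cost for
a fit would be the sum of one call for the longitudinal describing function and one call for the pitch
describing function.

```python
def chi_square(measured, modeled, se_real, se_imag):
    """Sum of squared real and imaginary errors, each scaled by its standard error."""
    total = 0.0
    for F, F_hat, s_re, s_im in zip(measured, modeled, se_real, se_imag):
        err = F - F_hat
        total += (err.real / s_re) ** 2 + (err.imag / s_im) ** 2
    return total

# Toy values at two frequencies (illustrative only)
measured = [1.0 + 2.0j, 3.0 - 1.0j]
modeled = [1.5 + 2.0j, 3.0 + 0.0j]
se_real = [0.5, 1.0]
se_imag = [1.0, 0.5]

value = chi_square(measured, modeled, se_real, se_imag)
```

Scaling each error by its standard error means that well-measured points constrain the fit more strongly
than noisy ones, which matters here because the two describing functions are measured with different
precision.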


The previously derived models in Eqs. 16 and 17 were based on the original CM, which was developed
to describe manual control behavior in a limited frequency range. McRuer and his colleagues found that
when the measurements outside the region of crossover were sufficiently accurate, a more complex
model, which they named the Precision Model, was necessary to describe the data in the regions above and
below crossover [33]. A similar result was found in fitting the data from these experiments; matching the
high-frequency measurements required the addition of two terms. One term was a second-order neuromuscular
dynamics term; the other was a different, shorter time delay in processing the rotational motion than in
processing the translational motion. The resulting models were:

$$H_x(s) = \frac{K_x e^{-s\tau}(s + \omega_L)}{s^2/\omega_N^2 + 2\zeta_N s/\omega_N + 1} \qquad (28)$$

$$H_\theta(s) = H_x(s)\,e^{s\tau_\theta}\,\frac{K_\beta s + K_\gamma \omega_L}{s + \omega_L} \qquad (29)$$

Two simpler versions of this model were also considered: a seven-parameter model for which the pitch
motion time advance $\tau_\theta$ was zero, and a six-parameter model in which the same visual cue was used
for both position and velocity feedback (achieved by constraining $K_\beta = K_\gamma$). Each parameter
addition resulted in significant decreases in $\chi^2$. A rule of thumb is that when the chi-square
value is approximately equal to the number of degrees of freedom (the number of measurements minus the
number of parameters), the model is a moderately good fit to the data and further refinements are not
warranted [32]. For this case, with 48 measurements for each operator and condition (two complex describing
function measurements at twelve frequencies) and eight parameters, a limiting value for model refinement
would be 40.0. The fact that the average $\chi^2$ for the eight-parameter fit was nearly twice this amount
would suggest that additional parameters could be warranted, but candidate ninth parameters did not result
in better fits. The lack of further improvement is likely because the model was linear, and the perspective
projection of the world coordinates into image coordinates is inherently nonlinear.
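
The structure of the extended model and the way the simpler versions nest inside it can be sketched in
code. The sketch assumes the extended pitch describing function keeps the form of Eq. 17, with the
sensitivity parameters replaced by the gains for position and velocity and with an added time advance; the
function names and all parameter values are illustrative and are not fitted results.

```python
import cmath

def H_x(s, K_x, omega_L, tau, omega_N, zeta_N):
    """Position describing function with second-order neuromuscular dynamics."""
    neuromuscular = s**2 / omega_N**2 + 2.0 * zeta_N * s / omega_N + 1.0
    return K_x * cmath.exp(-s * tau) * (s + omega_L) / neuromuscular

def H_theta(s, K_x, omega_L, tau, omega_N, zeta_N, K_gamma, K_beta, tau_theta):
    """Pitch describing function: H_x, a time advance, and a blend of the two cue gains."""
    cue_blend = (K_beta * s + K_gamma * omega_L) / (s + omega_L)
    return H_x(s, K_x, omega_L, tau, omega_N, zeta_N) * cmath.exp(s * tau_theta) * cue_blend

def degrees_of_freedom(n_measurements=48, n_parameters=8):
    """Number of measurements minus number of fitted parameters."""
    return n_measurements - n_parameters

# Illustrative evaluation at 2 rad/s with arbitrary parameter values
s = 2.0j
common = dict(K_x=1.0, omega_L=0.5, tau=0.25, omega_N=6.5, zeta_N=0.5)
full = H_theta(s, K_gamma=10.0, K_beta=5.0, tau_theta=0.05, **common)

# Six-parameter special case: one cue for both position and velocity, no time advance
reduced = H_theta(s, K_gamma=7.0, K_beta=7.0, tau_theta=0.0, **common)
```

With equal gains and no time advance, the cue blend collapses to a constant, so the pitch describing
function becomes a scaled copy of the position describing function; this is the sense in which the
six-parameter model uses a single visual cue.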

When measured in terms of magnitude and phase, the errors between the model and the measurements are
modest. For the operator describing function to the longitudinal position, $F_x$, the error between
the model and measurement had a mean magnitude error of 0.073 dB with a standard deviation of 1.41 dB, and
a mean phase error of -0.021 degrees with a standard deviation of 18.9 degrees. For the operator describing
function to the pitch attitude, $F_\theta$, the error between the model and measurement had a mean
magnitude error of -0.27 dB with a standard deviation of 1.73 dB, and a mean phase error of -0.51 degrees
with a standard deviation of 29.0 degrees. An example of the describing function data and resulting model
fit for one operator and condition is shown in Fig. 5. Note that the standard errors are much smaller for
the measurement of $F_x$ than $F_\theta$; this is a natural result of the fact that the operators are
controlling position, not attitude.

Table 1 contains the mean and standard error of the seven model parameters, as well as crossover 
frequency ( uoc ), phase margin (0 m), and y 2 , as a function of display type. A one-way ANalysis Of VAriance 
(ANOVA) was conducted on the model parameters, crossover frequency, and phase margin, to determine 
if the changes as a function of display type were statistically significant. The last two columns in Table 1 
contain the F value resulting from the ANOVA, and the estimated probability (p) that the observed effect 
of display is occurring from chance. Using a cutoff of p < 0.05 for determining statistical significance, if x , 
cy/v, Gv, if 7 , if/ 3 , t< 9 , l J c, and 0 m exhibited significant main effects from display type. Only the differences 
in time delay r and the lead break frequency ujl were not statistically significant. 

The variables if 7 , if#, tq , co^r, Gv, if 7 , if/ 3 , t#, cjc, and 0m are shown in Figure 6. The variable K x is 
not shown independently because the primary effect of it is observed in the crossover frequency ujc • Paired t 




Figure 5. Example plot showing comparison of model fit with measurements. The operator was using the Parallel display for this measurement; standard error bars are shown. $\chi^2$ for this model fit was 50.8.
Table 1. Model parameters. Values of p denoting statistical significance (p < 0.05) are marked with **.

Parameter                     Grid             Parallel         Perpendicular    Line             F       p
$K_x$ (s·eyeheight)           96.8 ± 66.9      96.4 ± 52.5      73.1 ± 42.9      72.1 ± 39.2      5.75    0.0050**
$\omega_L$ (rad/s)            0.569 ± 0.268    0.565 ± 0.273    0.591 ± 0.279    0.505 ± 0.227    1.29    0.3050
$\tau$ (s)                    0.250 ± 0.025    0.249 ± 0.024    0.245 ± 0.025    0.249 ± 0.027    0.43    0.7320
$\omega_N$ (rad/s)            6.71 ± 1.87      6.78 ± 1.49      6.31 ± 1.41      6.31 ± 1.31      4.11    0.0190**
$\zeta_N$ (rad/s)             0.482 ± 0.091    0.526 ± 0.096    0.483 ± 0.074    0.537 ± 0.117    4.54    0.0130**
$K_\gamma$ (s·eyeheight)      8.32 ± 3.96      10.07 ± 2.69     10.12 ± 2.70     12.68 ± 2.60     4.56    0.0130**
$K_\beta$ (s·eyeheight)       3.07 ± 1.69      4.38 ± 1.57      6.34 ± 1.21      8.07 ± 0.43      59.01   < 0.0001**
$\tau_\theta$ (s)             0.075 ± 0.025    0.055 ± 0.023    0.043 ± 0.012    0.022 ± 0.009    13.60   < 0.0001**
$\omega_c$ (rad/s)            2.010 ± 0.401    2.072 ± 0.292    1.814 ± 0.340    1.791 ± 0.265    8.38    0.0010**
$\phi_m$ (deg)                33.45 ± 8.10     30.68 ± 6.55     35.76 ± 6.15     36.46 ± 5.68     3.22    0.0440**
$\chi^2$                      93.96 ± 45.50    74.48 ± 24.43    89.73 ± 33.48    75.13 ± 43.06    0.71    0.5580


tests were performed to determine whether differences observed between display conditions were statistically 
significant; the results are shown in Table 2. 


Table 2. Statistical significance (p values) of paired t-tests on the model parameters. The symbols #, ⊥, ∥, and | denote the Grid, Perpendicular, Parallel, and Line displays, respectively. As an example of how to read the chart, the parameter $\omega_c$ shows a significant difference between the Grid and Parallel display conditions, and between the Grid and Line, but not between the Grid and Perpendicular. Lack of statistical significance is shown with a dash.



C. Discussion 

1. Potential Visual Cues 

For the example task, five potential types of visual cues, illustrated in Fig. 7, were considered that could 
provide the relevant state information to the operator. The derivation of the sensitivity parameters for 
each cue is contained in Appendix B. Symbols, analytical expressions and numerical values of the sensitivity 
parameters for each of the visual cues are shown in Table 3. One would anticipate that operators would 
choose visual cues that minimize the impact of the uncontrolled pitch attitude perturbations; this would 
imply the selection of a cue or cues that minimize the sensitivity parameter. Another factor that needs to 
be considered is whether the cue can be effectively perceived by the operator. 
Figure 6. Identified gain parameters $K_\gamma$ and $K_\beta$ (a), pitch time advance $\tau_\theta$ (b), crossover frequency $\omega_c$ and phase margin $\phi_m$ (c), and neuromuscular frequency $\omega_N$ and damping $\zeta_N$ (d), averaged across subjects. Standard error bars are shown.


Table 3. Sensitivity parameters as a function of visual cue. All numerical values are expressed in units of eyeheight/rad. Display conditions in which the visual cue is present are denoted with an X.

Visual cue    Symbol                     Expression                              Value     Grid    Perpendicular    Parallel    Line
$V$           $W_{[V,\Theta]}$           $D_x^2 + 1$                             ≥ 10.0    X       X                X           X
$\Delta V$    $W_{[\Delta V,\Theta]}$    1                                       1.0       X       X                X           X
$H$           $W_{[H,\Theta]}$           1                                       1.0       X       X
$\Delta H$    $W_{[\Delta H,\Theta]}$    1                                       1.0       X       X
$S$           $W_{[S,\Theta]}$           $(D_y^2 + D_x^2 + 1)/(D_y^2 + 1)$       ≥ 2.6     X       X
First consider $V$, the vertical displacement of a single feature in the image. The sensitivity parameter of this cue is a function of the position of the scene feature, specifically $W_{[V,\Theta]} = D_x^2 + 1$ eyeheight/rad. This parameter is minimized by minimizing $D_x$, that is, by using the feature closest to the observer. The minimum sensitivity parameter for this cue is $W_{[V,\Theta]} = 10.0$ (when using the closest feature at $D_x = 3.0$ eyeheights). This will be shown to be a relatively poor cue in relation to the other cues analyzed. However, this cue is easily perceived because 1) the line is clearly visible, and 2) the position of the line is being judged relative to a fixed image location (i.e., the screen edge). The visual cue $V$ is available in all of the display configurations. The possible values of $W_{[V,\Theta]}$ are shown as a function of image location in Figure 8a.

Next consider the characteristics of $S$, the component of the displacement of a feature parallel to the lines-of-splay. The expression for the sensitivity parameter is $W_{[S,\Theta]} = (D_y^2 + D_x^2 + 1)/(D_y^2 + 1)$ eyeheight/rad. The value of $W_{[S,\Theta]}$ is minimized by minimizing $D_x$ and maximizing $D_y$, which is achieved at the lower corners of the display, as shown in Figure 8b. For the features present in the displays, the lowest achievable value of $W_{[S,\Theta]}$ is 2.6 eyeheight/rad. This is a superior cue to the absolute vertical displacement cue $V$ analyzed previously. It is anticipated that this cue would be used as a motion cue rather than a position cue, by detecting motion in the direction of a line of splay. This cue is present in the Grid and Perpendicular displays, not the Parallel or Line displays.

The remaining three cues to consider, $H$, $\Delta H$, and $\Delta V$, have a sensitivity parameter value of unity at every location in the display ($W_{[H,\Theta]} = W_{[\Delta H,\Theta]} = W_{[\Delta V,\Theta]} = 1$ eyeheight/rad). To review, $H$ is the horizontal displacement of a single feature in the image. $\Delta H$ and $\Delta V$ are, respectively, horizontal and vertical displacements between two scene features in the image. While a sensitivity parameter magnitude of unity is superior to the previously considered cues, these three cues may be of limited practical usage. First consider the two cues related to horizontal judgements, either the absolute horizontal displacement of a feature $H$ or the relative horizontal displacement between features $\Delta H$. Since the task is longitudinal position control, and the control effector moves in the longitudinal plane, there might be some practical difficulty in using a horizontal (i.e., lateral) displacement cue because of the lack of intuitive mapping between the cues; at best, one would expect control reversals to occur. Now consider the cue related to the relative vertical displacement between features ($\Delta V$). Although this cue potentially has a better intuitive mapping to longitudinal position, one problem that might be encountered using this cue is the fact that the two features would both be moving because of the pitch motion. The constant motion of both features could make this cue difficult to perceive. The cue using relative horizontal displacements ($\Delta H$) could be difficult to use for this reason as well.

In summary, three potential visual cues, $H$, $\Delta H$, and $\Delta V$, have the best theoretical characteristics, but perceiving them accurately could be difficult. The worst cue to use would be $V$, although it is probably easily perceived. $S$ would be a potentially good cue for motion perception, particularly if the best cues are difficult to perceive.
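As a rough numerical illustration of this comparison, the following Python sketch evaluates the sensitivity expressions from Table 3 for the nearest feature at $D_x = 3.0$ eyeheights. The function names are hypothetical, and the lateral offset $D_y$ at which $S$ reaches 2.6 eyeheight/rad is inferred here from the expression; it is not a value stated in the text.

```python
import math


def w_vertical(d_x):
    """Sensitivity of cue V: W = D_x^2 + 1 (eyeheight/rad)."""
    return d_x ** 2 + 1.0


def w_splay(d_x, d_y):
    """Sensitivity of cue S: W = (D_y^2 + D_x^2 + 1) / (D_y^2 + 1)."""
    return (d_y ** 2 + d_x ** 2 + 1.0) / (d_y ** 2 + 1.0)


def lateral_offset_for(w_target, d_x):
    """Lateral offset D_y at which cue S reaches w_target (inverse of w_splay)."""
    return math.sqrt(d_x ** 2 / (w_target - 1.0) - 1.0)


D_X_NEAREST = 3.0  # closest feature, in eyeheights (value from the text)

w_v = w_vertical(D_X_NEAREST)
w_s_centerline = w_splay(D_X_NEAREST, 0.0)
d_y_for_2p6 = lateral_offset_for(2.6, D_X_NEAREST)  # inferred, not from the paper

print(f"W[V]               = {w_v:.1f} eyeheight/rad")
print(f"W[S] at D_y = 0    = {w_s_centerline:.1f} eyeheight/rad")
print(f"D_y for W[S] = 2.6 = {d_y_for_2p6:.2f} eyeheights")
```

On the centerline ($D_y = 0$) the expression for $S$ reduces to that for $V$, which is consistent with the observation that $S$ improves only as features move toward the lateral edges of the display.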


Figure 7. Examples of available visual cues in the displays 
tested. 



2. Visual Cue Identification 

Now we will consider the pilot model parameters ($K_\gamma$ and $K_\beta$) that correspond to the two hypothetical visual cues ($\beta$ for motion, $\gamma$ for position) in the VCCM. Identified values of $K_\gamma$ and $K_\beta$ are shown in Table 1 and Figure 6a; the plot also shows the predicted values of the sensitivity parameters for the potential visual cues analyzed. The differences in these parameters were statistically significant, with probabilities of the differences occurring by chance of p = 0.013 and p < 0.0001 for $K_\gamma$ and $K_\beta$, respectively. The identified parameters generally fall within the expected range for the available visual cues. In no case did an identified
Figure 8. Direction of image motion resulting from longitudinal displacement (a) and pitch displacement (b). 
Nominal position of a location 3.0 eyeheights in front of the observer is shown with a dotted line. 





parameter go below unity, and most of the parameters are between 1 and 10 eyeheight/rad. Also note that 
the changes in display type produced systematic changes in the identified parameters. 

Now let us examine these parameters in more detail, starting with $K_\gamma$, the identified gain on the visual cue $\gamma$. The identified $K_\gamma$ was very close to the predicted value of 10 eyeheight/rad for $W_{[V,\Theta]}$. The difference in $K_\gamma$ between the Grid and Line display conditions was statistically significant, with lower values being obtained with the Grid display. At the conclusion of the experiments, the operators were asked for verbal descriptions of their visual cue strategies. Many of them indicated that they were trying to keep the distance between the horizon and the line in the foreground constant (which would produce $W_{[\Delta V,\Theta]} = 1$ eyeheight/rad), but that because of the motion of both lines they needed to also reference the foreground line (which would produce $W_{[V,\Theta]} = 10$ eyeheight/rad). Values of $K_\gamma$ less than 10 could be consistent with an attention-sharing strategy between the two cues, as indicated by the operators.


Figure 9. Direction of image motion resulting from longitudinal motion and pitch rate. Longitudinal motion is shown in (a) and (c) for the Grid/Perpendicular and Parallel/Line displays, respectively. Pitch motion is shown in (b) and (d) for the Grid/Perpendicular and Parallel/Line displays.


It appears that the richer scene content in the Grid display provided operators with a greater ability to disregard the effects of pitch.

Now we will consider $K_\beta$, the identified gain on the visual cue $\beta$ in the pilot model. The paired t-test comparisons revealed that the differences in this parameter between each display pair were statistically significant. Notably, the mean value of $K_\beta$ when using the Grid display is close to 2.6 eyeheight/rad, the value associated with the optimal use of $S$ (motion along a line-of-splay). When interviewed, many operators verbally indicated that they were attending to their peripheral vision to "quicken" their responses. Interestingly, one operator achieved values better than what could be achieved using $S$ with both the Grid and Parallel displays. When interviewed, this operator indicated that he had indeed discovered and used the "optimal" cue $H$; he was controlling the horizontal displacement of a grid intersection (only available in the Grid and Parallel displays) relative to the image frame. This strategy would correspond to a sensitivity parameter $W_{[H,\Theta]}$ of unity, very close to the identified value of $K_\beta$ for this operator. The identified values of $K_\beta$ were lower when using the displays containing the angle-of-splay cue $S$, the Grid and the Parallel display. The fact that operators achieved values lower than that expected for using visual cue $V$ could suggest that operators could perceive the motion between lines ($\Delta V$) to some extent; even in the Line display, operators can use the displacement between the horizon and the line. The fact that values of $K_\beta$ were consistently lower than $K_\gamma$ indicates that the visual motion processing capabilities of the operators yielded better differentiation between the effects of rotation and translation.

$\tau_\theta$ is another model parameter related to visual cue processing. This parameter is shown in Table 1 and Figure 6b, and the effect of display type on it was statistically significant (p < 0.0001). This parameter describes a time advance in the use of pitch attitude $\theta$ relative to the longitudinal position $x$. In all cases, pitch was acted upon slightly faster than longitudinal position. However, the size of the difference depended upon the presence of the lines-of-splay. Pitch time advance $\tau_\theta$ with the Grid and Perpendicular displays was higher than with the Parallel and Line, as shown in the pairwise comparisons in Table 2. These differences are likely due to significant differences in how the longitudinal and pitch displacements affect the different display types. Figure 9 illustrates how longitudinal and pitch displacements affect the Grid and Perpendicular displays (a and b), and how these displacements affect the Parallel and Line displays (c and d). In the Grid/Perpendicular displays, longitudinal movement creates significant differences in displacement/optic flow magnitude and direction as a function of location in the image (Figure 9a). Pitch movement creates mostly uniform vertical displacement and optic flow (Figure 9b). In the Parallel/Line displays, the difference arising from the two states is chiefly in the distribution of vertical motion (Figures 9c and 9d). It is likely that operators could disambiguate the effects of pitch and longitudinal motion more quickly with the Grid and Perpendicular displays, leading to increases in time advance.

Some of the parameters related to the primary control task (control of longitudinal position $x$) were also statistically significant, although these were largely main effects of display condition rather than significant pairwise differences between display conditions. Crossover frequency $\omega_c$ and phase margin $\phi_m$ are shown in Figure 6c, and neuromuscular parameters $\omega_N$ and $\zeta_N$ are shown in Figure 6d. The crossover frequency was higher in the Grid and Perpendicular display conditions than with the Parallel and Line displays. It is likely that uncertainty in perceiving the controlled variable due to pitch variations caused operators to reduce their control gains.


IV. Conclusions 

The Visual Cue Control Model has been shown to be highly accurate at describing the human operator characteristics in performing a longitudinal control task with a perspective display. The model incorporates parameters that can be directly related to the use of visual cues in the perspective display. The experimental validation showed that 1) operators used two different cue sources for position and velocity feedback; 2) identified parameters for the position feedback visual cue $K_\gamma$ and the velocity visual cue $K_\beta$ corresponded closely with the hypothetical visual cues present in the different display conditions; and 3) pitch attitude was reflected in the operator output more quickly than longitudinal position, with this differential time increasing with display complexity. Two of the model parameters associated with visual cue usage, $K_\beta$ and $\tau_\theta$, demonstrated not only large main effects, but also significant differences as a function of display type in pairwise comparisons. Overall, the Grid and Perpendicular displays were associated with lower values of $K_\beta$, and higher values of $\tau_\theta$, than the Parallel and Line displays. The presence of the lines of splay appears to enable better 'rejection' of the pitch attitude variations, as well as faster processing of pitch attitude.

Although the ANOVA analysis demonstrated many significant main effects when averaged across operators, operators sometimes self-reported very different visual cue usage strategies. A follow-on (Monte Carlo) analysis is planned to identify the statistical properties of the parameters for individuals, in order to study individual differences.

The model shows great promise as a tool for determining optimum scene content, as well as development 
of training instructions for visually guided control tasks. This is evidenced by the fact that only one of the 
eight operators found and used one of the optimal visual cues. It should be possible to do a priori analysis of 
a task to determine the most effective cues, and use this as the basis for training. The model, when used to 
identify parameters in an experimental setting, was also useful in determining when a theoretically optimal 
cue was not being used effectively. This is an important feature for validation of perspective scene designs: it 
can be determined if the cues are being used as expected, and/or when a particular visual cue is ineffective. 

The Visual Cue Control Model is generally extensible to any manual control task in which a perspective 
display is used. This technique incorporates the well-validated Crossover Model, and additional parameters 
associated with the perspective scene and visual cues. These additional parameters are readily identified in 
an empirical setting, and can be used to verify the visual-cue usage of the human operator. 

Appendices 

A. General Perspective Projection Transformation 

The coordinate system $(I, J, K)$ is inertial, assumed fixed in the world (see Fig. 10). The feature being imaged is located at $D = D_x I + D_y J + D_z K$. The operator eyepoint is located at $P = X I + Y J + Z K$. The orientation of the vehicle-fixed coordinate system $(i, j, k)$, relative to the inertial coordinate system, is described by the Eulerian angles (Ref. 34) $\Psi$ (heading), $\Theta$ (pitch), and $\Phi$ (roll). The vehicle coordinate system can be obtained from the inertial coordinate system through three sequential body-fixed rotations: $\Psi$ about $k$, $\Theta$ about $j$, and $\Phi$ about $i$.

The transformation which relates the operator location and orientation $(X, Y, Z, \Phi, \Theta, \Psi)$, feature location $(D_x, D_y, D_z)$, and focal length ($L$) to image coordinates (Ref. 35) is:

$$I_h = w\,[(D_x - X)(-\sin\Psi\cos\Phi + \cos\Psi\sin\Theta\sin\Phi) + (D_y - Y)(\cos\Psi\cos\Phi + \sin\Psi\sin\Theta\sin\Phi) + (D_z - Z)\cos\Theta\sin\Phi] \tag{30}$$

$$I_v = w\,[(D_x - X)(\sin\Psi\sin\Phi + \cos\Psi\sin\Theta\cos\Phi) + (D_y - Y)(-\cos\Psi\sin\Phi + \sin\Psi\sin\Theta\cos\Phi) + (D_z - Z)\cos\Theta\cos\Phi] \tag{31}$$

where $w$ is defined as:

$$w = \frac{L}{(D_x - X)\cos\Psi\cos\Theta + (D_y - Y)\sin\Psi\cos\Theta - (D_z - Z)\sin\Theta} \tag{32}$$
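The following is a minimal Python sketch of this transformation, assuming Eqs. 30 through 32 are read with $\Psi$ as heading, $\Theta$ as pitch, and $\Phi$ as roll. The function name `project`, its argument layout, and the default focal length are illustrative choices and not part of the original analysis. The usage example also assumes a sign convention in which a ground feature satisfies $D_z - Z = 1$ eyeheight.

```python
import math


def project(feature, eye, angles, focal_length=1.0):
    """Map a world feature (Dx, Dy, Dz) to image coordinates (I_h, I_v).

    feature, eye: world coordinates in eyeheights.
    angles: (roll, pitch, heading) in radians.
    """
    dx, dy, dz = (f - e for f, e in zip(feature, eye))
    phi, theta, psi = angles
    s_phi, c_phi = math.sin(phi), math.cos(phi)
    s_th, c_th = math.sin(theta), math.cos(theta)
    s_psi, c_psi = math.sin(psi), math.cos(psi)

    # Eq. 32: focal length divided by the distance along the viewing axis.
    w = focal_length / (dx * c_psi * c_th + dy * s_psi * c_th - dz * s_th)
    # Eq. 30: horizontal image coordinate.
    i_h = w * (dx * (-s_psi * c_phi + c_psi * s_th * s_phi)
               + dy * (c_psi * c_phi + s_psi * s_th * s_phi)
               + dz * c_th * s_phi)
    # Eq. 31: vertical image coordinate.
    i_v = w * (dx * (s_psi * s_phi + c_psi * s_th * c_phi)
               + dy * (-c_psi * s_phi + s_psi * s_th * c_phi)
               + dz * c_th * c_phi)
    return i_h, i_v


# Hypothetical usage: ground feature 3 eyeheights ahead and 1 to the side,
# level attitude, eye placed so that D_z - Z = 1 (assumed sign convention).
i_h, i_v = project((3.0, 1.0, 0.0), (0.0, 0.0, -1.0), (0.0, 0.0, 0.0))
print(i_h, i_v)
```

With zero heading, roll, and lateral offset, this function reduces to the two-degree-of-freedom expressions used for the example task in Appendix B.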


B. Visual Cue Sensitivity Parameter Derivation 

For the example task (see Fig. 3), in which the only vehicle degrees-of-freedom are longitudinal position and pitch attitude, and vertical position is set to Z = 1.0 eyeheight, the relationships defining the image-plane coordinates of a particular scene feature (Eqs. 30 through 32) simplify to:

$$I_h = \frac{L D_y}{(D_x - X)\cos\Theta - \sin\Theta} \tag{33}$$

$$I_v = L\,\frac{(D_x - X)\sin\Theta + \cos\Theta}{(D_x - X)\cos\Theta - \sin\Theta} \tag{34}$$

in which $L$ is the focal length, $X$ is the longitudinal position and $\Theta$ is the pitch attitude of the operator's vehicle reference frame, and $D_x$ and $D_y$ are the longitudinal and lateral locations, respectively, of a scene feature (all expressed in units of eyeheights).

Figure 10. Geometry of the imaging process, relating vehicle state variables and scene feature locations in world coordinates to image coordinates.


1. Vertical Location of a Feature in the Image 

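The visual cue $V$ is defined as the vertical position, in the image, of a feature located at a longitudinal position $D_x$ in the world:

$$V = I_v = L\,\frac{(D_x - X)\sin\Theta + \cos\Theta}{(D_x - X)\cos\Theta - \sin\Theta} \tag{35}$$

The sensitivity parameter is defined as (from Eq. 4):

$$W_{[V,\Theta]} = \left.\frac{\partial V/\partial\Theta}{\partial V/\partial X}\right|_{X=X_0,\;\Theta=\Theta_0} \tag{36}$$

The partial derivatives are:

$$\frac{\partial V}{\partial X} = L\left[\frac{-\sin\Theta}{(D_x - X)\cos\Theta - \sin\Theta} + \frac{\cos\Theta\,[(D_x - X)\sin\Theta + \cos\Theta]}{[(D_x - X)\cos\Theta - \sin\Theta]^2}\right] \tag{37}$$

$$\frac{\partial V}{\partial\Theta} = L\left[1 + \frac{[(D_x - X)\sin\Theta + \cos\Theta]^2}{[(D_x - X)\cos\Theta - \sin\Theta]^2}\right] \tag{38}$$

Evaluating at the linearization conditions of $X_0 = 0$ and $\Theta_0 = 0$ yields:

$$\left.\frac{\partial V}{\partial X}\right|_{X=0,\;\Theta=0} = \frac{L}{D_x^2} \tag{39}$$

$$\left.\frac{\partial V}{\partial\Theta}\right|_{X=0,\;\Theta=0} = L\left[1 + \frac{1}{D_x^2}\right] \tag{40}$$

The sensitivity parameter $W_{[V,\Theta]}$ results from substituting Equations 39 and 40 into Equation 36:

$$W_{[V,\Theta]} = D_x^2 + 1 \tag{41}$$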
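The closed-form result can be cross-checked numerically. The sketch below, in Python, approximates the two partial derivatives of the cue $V$ by central finite differences about $X = 0$, $\Theta = 0$ and forms their ratio. The function names and the step size `h` are illustrative assumptions; the focal length cancels in the ratio and is set to 1.

```python
import math


def cue_v(x, theta, d_x, focal_length=1.0):
    """Vertical image position of a ground feature at longitudinal position d_x."""
    a = d_x - x
    return focal_length * (a * math.sin(theta) + math.cos(theta)) / (
        a * math.cos(theta) - math.sin(theta))


def sensitivity_v(d_x, h=1e-6):
    """Finite-difference estimate of (dV/dTheta) / (dV/dX) at X = 0, Theta = 0."""
    dv_dtheta = (cue_v(0.0, h, d_x) - cue_v(0.0, -h, d_x)) / (2.0 * h)
    dv_dx = (cue_v(h, 0.0, d_x) - cue_v(-h, 0.0, d_x)) / (2.0 * h)
    return dv_dtheta / dv_dx


w_numeric = sensitivity_v(3.0)   # nearest feature in the displays
w_closed_form = 3.0 ** 2 + 1.0   # D_x^2 + 1
print(w_numeric, w_closed_form)
```

The two values should agree to within the finite-difference error, which for this step size is far below the precision of the identified parameters.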


2. Vertical Displacement between Two Features 

The visual cue $\Delta V$ is the vertical displacement between two features located at vertical positions $I_{v1}$ and $I_{v2}$ in the image, and longitudinal positions $D_{x1}$ and $D_{x2}$ in the world. Mathematically, $\Delta V$ is expressed as:

$$\Delta V = I_{v2} - I_{v1} = L\left[\frac{(D_{x2} - X)\sin\Theta + \cos\Theta}{(D_{x2} - X)\cos\Theta - \sin\Theta} - \frac{(D_{x1} - X)\sin\Theta + \cos\Theta}{(D_{x1} - X)\cos\Theta - \sin\Theta}\right] \tag{42}$$

Application of Equation 4 will result in a sensitivity parameter $W_{[\Delta V,\Theta]} = 1$.


3. Horizontal Displacement Between Two Features 

The visual cue $\Delta H$ is defined as the horizontal displacement between any two features in the image, located at horizontal positions $I_{h1}$ and $I_{h2}$ in the image, and positions $(D_{x1}, D_{y1})$ and $(D_{x2}, D_{y2})$ in the world. $\Delta H$ is defined as:

$$\Delta H = I_{h2} - I_{h1} = L\left[\frac{D_{y2}}{(D_{x2} - X)\cos\Theta - \sin\Theta} - \frac{D_{y1}}{(D_{x1} - X)\cos\Theta - \sin\Theta}\right] \tag{43}$$

The resulting sensitivity parameter is $W_{[\Delta H,\Theta]} = 1$.
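The unity results for the two relative-displacement cues can be checked in the same numerical way. The Python sketch below uses the simplified image coordinates of Eqs. 33 and 34 and central finite differences at $X = 0$, $\Theta = 0$. The feature locations `NEAR` and `FAR` are hypothetical values chosen only for illustration.

```python
import math


def image_coords(x, theta, d_x, d_y, focal_length=1.0):
    """Eqs. 33-34: image coordinates (I_h, I_v) of a ground feature."""
    den = (d_x - x) * math.cos(theta) - math.sin(theta)
    i_h = focal_length * d_y / den
    i_v = focal_length * ((d_x - x) * math.sin(theta) + math.cos(theta)) / den
    return i_h, i_v


def sensitivity(cue, h=1e-6):
    """Ratio (d cue / d Theta) / (d cue / d X) at X = 0, Theta = 0."""
    d_theta = (cue(0.0, h) - cue(0.0, -h)) / (2.0 * h)
    d_x = (cue(h, 0.0) - cue(-h, 0.0)) / (2.0 * h)
    return d_theta / d_x


# Hypothetical feature pair, (D_x, D_y) in eyeheights.
NEAR, FAR = (3.0, 1.0), (6.0, -2.0)


def delta_v(x, theta):
    """Vertical displacement between the two features in the image."""
    return image_coords(x, theta, *FAR)[1] - image_coords(x, theta, *NEAR)[1]


def delta_h(x, theta):
    """Horizontal displacement between the two features in the image."""
    return image_coords(x, theta, *FAR)[0] - image_coords(x, theta, *NEAR)[0]


w_delta_v = sensitivity(delta_v)
w_delta_h = sensitivity(delta_h)
print(w_delta_v, w_delta_h)
```

Both ratios come out at unity to within finite-difference error, independent of where the two features are placed, which is why these cues are insensitive to feature location in Table 3.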


4 . Component of Motion along a Line-of-Splay 

A line-of-splay is the line, in the image, formed by lines in the world that are parallel to the operator's direction of motion; several of these lines are indicated in Fig. 7. This cue consists of the component of displacement of a feature along a line-of-splay. This cue is considered because the motion of features, in the image, due to longitudinal motion of the vehicle, is along these lines-of-splay. The image motion resulting from pitch movement is largely vertical. Thus, if the operator attempts to control only motion that occurs along a line-of-splay, they would largely be controlling longitudinal motion. If the angle of the line-of-splay, relative to vertical, is defined to be $\alpha$, the component of that displacement parallel to the line-of-splay, defined as $S$, is:

$$S = I_h \sin\alpha + I_v \cos\alpha \tag{44}$$

$$S = L\left[\frac{D_y}{(D_x - X)\cos\Theta - \sin\Theta}\,\sin\alpha + \frac{(D_x - X)\sin\Theta + \cos\Theta}{(D_x - X)\cos\Theta - \sin\Theta}\,\cos\alpha\right] \tag{45}$$

The sine and cosine of $\alpha$ can also be expressed in terms of world coordinates:

$$\sin\alpha = \frac{D_y \cos\Theta}{\sqrt{D_y^2 \cos^2\Theta + 1}} \tag{46}$$

$$\cos\alpha = \frac{1}{\sqrt{D_y^2 \cos^2\Theta + 1}} \tag{47}$$

The sensitivity parameter for this cue can be shown to be:

$$W_{[S,\Theta]} = 1 + \frac{D_x^2}{D_y^2 + 1} \tag{48}$$
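The omitted intermediate steps can be checked numerically. The Python sketch below builds the splay cue from the image coordinates and the splay-angle expressions, then estimates the sensitivity by central finite differences at $X = 0$, $\Theta = 0$. The feature location and step size are hypothetical values chosen for illustration, and the focal length is set to 1 because it cancels in the ratio.

```python
import math


def cue_s(x, theta, d_x, d_y, focal_length=1.0):
    """Component of image position along the line-of-splay through the feature."""
    den = (d_x - x) * math.cos(theta) - math.sin(theta)
    i_h = focal_length * d_y / den
    i_v = focal_length * ((d_x - x) * math.sin(theta) + math.cos(theta)) / den
    # Splay angle relative to vertical, expressed in world coordinates.
    root = math.sqrt(d_y ** 2 * math.cos(theta) ** 2 + 1.0)
    sin_a = d_y * math.cos(theta) / root
    cos_a = 1.0 / root
    return i_h * sin_a + i_v * cos_a


def sensitivity_s(d_x, d_y, h=1e-6):
    """Finite-difference estimate of (dS/dTheta) / (dS/dX) at X = 0, Theta = 0."""
    ds_dtheta = (cue_s(0.0, h, d_x, d_y) - cue_s(0.0, -h, d_x, d_y)) / (2.0 * h)
    ds_dx = (cue_s(h, 0.0, d_x, d_y) - cue_s(-h, 0.0, d_x, d_y)) / (2.0 * h)
    return ds_dtheta / ds_dx


D_X, D_Y = 3.0, 2.0  # hypothetical feature location, eyeheights
w_numeric = sensitivity_s(D_X, D_Y)
w_closed_form = 1.0 + D_X ** 2 / (D_Y ** 2 + 1.0)
print(w_numeric, w_closed_form)
```

For this feature location both values are 2.8 eyeheight/rad to within finite-difference error. Setting $D_y = 0$ recovers the vertical-position result $D_x^2 + 1$.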